Scalable biomedical Named Entity Recognition: investigation of a database-supported SVM approach

Authors

  • Mona Soliman Habib
  • Jugal K. Kalita

Abstract

This paper explores scalability issues associated with the Named Entity Recognition (NER) problem in the biomedical publications domain using Support Vector Machines (SVMs). The performance of existing binary and multi-class SVM implementations with increasing training data sizes is compared to that of our new implementations. Our approach eliminates the need for prior language or domain-specific knowledge and achieves good out-of-the-box accuracy, comparable to that obtained using more complex approaches. The training time of multi-class SVMs is reduced by several orders of magnitude, which could make support vector machines a more viable and practical solution for real-world problems with large datasets.

Similar resources

A Novel Approach to Conditional Random Field-based Named Entity Recognition using Persian Specific Features

Named Entity Recognition is an information extraction technique that identifies named entities in a text. Three approaches have conventionally been used to extract named entities from text: rule-based, machine-learning-based, and hybrids of the two. Machine-learning-based methods perform well on Persian when they are trained with good features. To get good performanc...

Biomedical Named Entity Recognition Using Support Vector Machines: Performance vs. Scalability Issues

This paper examines the performance and scalability of Named Entity Recognition (NER) using multi-class Support Vector Machines (SVM) and high-dimensional features. The NER domain chosen for these experiments is the biomedical publications domain, especially selected due to its importance and inherent challenges. We use a simple machine learning approach that eliminates prior language knowledge...

A Generic Classifier-Ensemble Approach for Biomedical Named Entity Recognition

In named entity recognition (NER) for biomedical literature, approaches based on combined classifiers have demonstrated great performance improvement compared to a single (best) classifier. This is mainly owing to the sufficient level of diversity exhibited among the classifiers, which is a selective property of the classifier set. Given a large number of classifiers, how to select different classifiers to ...

Addressing Scalability Issues of Named Entity Recognition Using Multi-Class Support Vector Machines

This paper explores the scalability issues associated with solving the Named Entity Recognition (NER) problem using Support Vector Machines (SVM) and high-dimensional features. The performance results of a set of experiments conducted using binary and multi-class SVM with increasing training data sizes are examined. The NER domain chosen for these experiments is the biomedical publications doma...

Improvement of Chemical Named Entity Recognition through Sentence-based Random Under-sampling and Classifier Combination

Chemical Named Entity Recognition (NER) is the basic step for downstream information extraction tasks such as named entity resolution, drug-drug interaction discovery, and extraction of the names of molecules and their properties. Improvement in the performance of such systems may affect the quality of the subsequent tasks. Chemical text from which data for named entity recognition is extracte...

Journal title:
  • International journal of bioinformatics research and applications

Volume 6, Issue 2

Publication date: 2010